Feature Selection for Regression Problems
نویسنده
چکیده
Feature subset selection is the process of identifying and removing from a training data set as much irrelevant and redundant features as possible. This reduces the dimensionality of the data and may enable regression algorithms to operate faster and more effectively. In some cases, correlation coefficient can be improved; in others, the result is a more compact, easily interpreted representation of the target concept. This paper compares five well-known wrapper feature selection methods. Experimental results are reported using four well known representative regression algorithms.
منابع مشابه
An Overview of the New Feature Selection Methods in Finite Mixture of Regression Models
Variable (feature) selection has attracted much attention in contemporary statistical learning and recent scientific research. This is mainly due to the rapid advancement in modern technology that allows scientists to collect data of unprecedented size and complexity. One type of statistical problem in such applications is concerned with modeling an output variable as a function of a sma...
متن کاملOptimal Feature Selection for Data Classification and Clustering: Techniques and Guidelines
In this paper, principles and existing feature selection methods for classifying and clustering data be introduced. To that end, categorizing frameworks for finding selected subsets, namely, search-based and non-search based procedures as well as evaluation criteria and data mining tasks are discussed. In the following, a platform is developed as an intermediate step toward developing an intell...
متن کاملOptimal Feature Selection for Data Classification and Clustering: Techniques and Guidelines
In this paper, principles and existing feature selection methods for classifying and clustering data be introduced. To that end, categorizing frameworks for finding selected subsets, namely, search-based and non-search based procedures as well as evaluation criteria and data mining tasks are discussed. In the following, a platform is developed as an intermediate step toward developing an intell...
متن کاملComprehensive causal analysis of occupational accidents’ severity in the chemical industries; A field study based on feature selection and multiple linear regression techniques
Introduction: The causal analysis of occupational accidents’ severity in the chemical industries may improve safety design programs in these industries. This comprehensive study was implemented to analyze the factors affecting occupational accidents’ severity in the chemical industries. Methods and Materials: An analytical study was conducted in 22 chemical industries during 2016-2017. The stu...
متن کاملسودمندی رگرسیونهای تجمیعی و روشهای انتخاب متغیرهای پیشبین بهینه در پیشبینی بازده سهام
مقاله حاضر به بررسی سودمندی رگرسیونهای تجمیعی و روشهای انتخاب متغیرهای پیشبین بهینه (شامل روش مبتنی بر همبستگی و ریلیف) برای پیشبینی بازده سهام شرکتهای پذیرفته شده در بورس اوراق بهادار تهران میپردازد. بهمنظور ارزیابی عملکرد رگرسیون تجمیعی، معیارهای ارزیابی (شامل میانگین قدرمطلق درصد خطا، مجذور مربع میانگین خطا و ضریب تعیین) مربوط به پیشبینی این روش، با رگرسیون خطی و شبکههای عصبی مصنوعی...
متن کاملFuzzy-rough Information Gain Ratio Approach to Filter-wrapper Feature Selection
Feature selection for various applications has been carried out for many years in many different research areas. However, there is a trade-off between finding feature subsets with minimum length and increasing the classification accuracy. In this paper, a filter-wrapper feature selection approach based on fuzzy-rough gain ratio is proposed to tackle this problem. As a search strategy, a modifie...
متن کامل